An R workflow integrating computer vision and statistical ecology: a trade-off between deep learning for species identification and inferring spatial co-occurrence.
Computer vision is a field of artificial intelligence in which a machine is taught how to extract and interpret the content of an image (Krizhevsky, Sutskever, and Hinton 2012). Computer vision relies on deep learning that allows computational models to learn from training data – a set of manually labelled images – and make predictions on new data – a set of unlabelled images (Baraniuk, Donoho, and Gavish 2020; LeCun, Bengio, and Hinton 2015). With the growing availability of massive data, computer vision with deep learning is being increasingly used to perform important tasks such as object detection, face recognition, action and activity recognition or human pose estimation in fields as diverse as medicine, robotics, transportation, genomics, sports and agriculture (Voulodimos et al. 2018).
In ecology in particular, there is a growing interest in deep learning for automating repetitive analyses of large amounts of images, such as identifying plant and animal species, distinguishing individuals of the same or different species, counting individuals or detecting relevant features (Christin, Hervet, and Lecomte 2019; Lamba et al. 2019; Weinstein 2018). By saving hours of manual data analysis and tapping into the massive amounts of data that keep accumulating with technological advances, deep learning has the potential to become an essential tool for ecologists and applied statisticians.
Despite the promising future of computer vision and deep learning, several challenges stand in the way of their wide adoption by the community of ecologists (Wearn, Freeman, and Jacoby 2019). First, there is a programming barrier: most, if not all, algorithms are written in the Python language, whereas most ecologists are better versed in R (Lai et al. 2019). If ecologists are to use computer vision routinely, bridges between these two languages are needed (through, e.g., the reticulate package Allaire et al. (2017) or the shiny package Tabak et al. (2020)). Second, most recent applications of computer vision via deep learning in ecology have focused on computational aspects and simple tasks without addressing the underlying ecological questions (Sutherland et al. 2013) or carrying out the statistical analysis of the resulting data (Gimenez et al. 2014). Although perfectly understandable given the challenges at hand, we argue that a better integration of the why (ecological questions), the what (data) and the how (statistics) would be beneficial to computer vision for ecology (see also Weinstein 2018). In particular, this integration makes explicit the trade-off between the effort invested in training deep learning models and the quality of the ecological inference they ultimately support.
Here, we showcase a full why-what-how workflow in R using a case study on elucidating the structure of an ecological community (a set of co-occurring species), namely that of the Eurasian lynx (Lynx lynx) and its main prey. First, we introduce the case study and motivate the need for deep learning. Second, we illustrate deep learning for the identification of animal species in large amounts of images, including model training and validation with a dataset of labelled images, and prediction on a new dataset of unlabelled images. Last, we proceed with the quantification of spatial co-occurrence using statistical models. Our main conclusion is that there is no need to push deep learning very far to obtain a reasonable answer to the ecological question. We hope that our reproducible workflow will be useful to ecologists and applied statisticians.
Lynx (Lynx lynx) went extinct in France at the end of the 19th century due to habitat degradation, human persecution and decrease in prey availability (Vandel and Stahl 2005). The species was reintroduced in Switzerland in the 1970s (Breitenmoser 1998), then re-colonised France through the Jura mountains in the 1980s (Vandel and Stahl 2005). The species is listed as endangered under the 2017 IUCN Red list and is of conservation concern in France due to habitat fragmentation, poaching and collisions with vehicles. The Jura holds the bulk of the French lynx population.
To better understand its distribution, we need to quantify its interactions with its main prey, roe deer (Capreolus capreolus) and chamois (Rupicapra rupicapra) (Molinari-Jobin et al. 2007), two ungulate species that are also hunted. To assess the relative contributions of predation and hunting, a predator-prey program was set up jointly by the French Office for Biodiversity, the Federations of Hunters from the Jura, Ain and Haute-Savoie counties, and the French National Centre for Scientific Research.
Animal detections were made using camera traps deployed in the Jura and Ain counties in the Jura mountains (see Figure 1). We divided the two study areas into grids of 2.7 \(\times\) 2.7 km cells, or sites hereafter (Zimmermann et al. 2013), and set two camera traps per site (Xenon white flash with passive infrared trigger mechanisms; models Capture, Ambush and Attack; Cuddeback). Eighteen sites in the Jura study area and 11 in the Ain study area were active over the study period (February 2016 to October 2017 for the Jura county, and February 2017 to May 2019 for the Ain county). Camera traps were checked weekly to change memory cards and batteries, and to remove fresh snow after heavy snowfall.
Figure 1: Study area, grid and camera trap locations.
In total, 45563 and 18044 pictures were considered in the Jura and Ain sites respectively, after manually dropping empty pictures and pictures with unidentified species. We identified the species present on all images by hand (see Table 1) using digiKam, a free open-source digital photo management application (https://www.digikam.org/). This operation took several weeks of full-time labor, which is often identified as a limitation of camera trap studies. Computer vision with deep learning has been identified as a promising approach to expedite this tedious task (Norouzzadeh et al. 2021; Tabak et al. 2019; Willi et al. 2019).
| Species in Jura study site | n | Species in Ain study site | n |
|---|---|---|---|
| human | 31644 | human | 8931 |
| vehicle | 5637 | vehicle | 2390 |
| dog | 2779 | rider | 1206 |
| fox | 2088 | roe deer | 1101 |
| chamois | 919 | dog | 1057 |
| wild boar | 522 | fox | 922 |
| badger | 401 | wild boar | 643 |
| roe deer | 368 | badger | 577 |
| cat | 343 | hunter | 368 |
| lynx | 302 | lynx | 203 |
We used transfer learning to fine-tune a pre-trained CNN (resnet50) on the annotated pictures from the Jura site. We then compared the predictions of this new model for the pictures from the Ain site with the manual annotations of these pictures. Transfer learning was performed on GPU machines.
We use the fastai R package, which provides R wrappers to the Python fastai library; the fastai library greatly simplifies the training of CNNs.
We explain the general principle and detail the steps below. For reproducibility, the code below runs on CPU, with a subsample of the picture datasets and only a few pictures for automatic tagging. The results reported here, however, were obtained with GPU, more epochs and all pictures. The fully-trained model and all pictures are provided via Zenodo.
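The steps above can be sketched in a few lines with the fastai R package. This is a minimal sketch, not the full training script: it assumes the labelled Jura pictures are organized in one subfolder per species under a hypothetical path `pix/jura/`, and runs only a few epochs so that it stays feasible on CPU.

```r
library(fastai)

# Build data loaders from a folder with one subfolder per species
# (pix/jura/ is a hypothetical path; 20% of pictures held out for validation)
dls <- ImageDataLoaders_from_folder(
  path = "pix/jura",
  valid_pct = 0.2,
  item_tfms = Resize(size = 224),
  bs = 32
)

# Transfer learning: start from a pre-trained resnet50 network
learn <- cnn_learner(dls, resnet50(), metrics = accuracy)

# A few epochs only, for illustration; the reported results were
# obtained on GPU with more epochs and all pictures
learn %>% fit_one_cycle(3)
```

On a GPU machine the same code runs unchanged; only the training time differs.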
This is where deep learning comes into play, being increasingly used in ecology; see for example Christin, Hervet, and Lecomte (2019). The idea is to feed algorithms with pictures as input and to recover as output the species present in each picture. We used the fastai library, which builds on the Python language and its PyTorch library. An advantage of this library is that it comes with an R package, fastai, which provides several functions to use it from R.
What results did we obtain? We first applied transfer learning on a study site in the Jura where we had pictures that were already tagged, using a pre-trained resnet50 model. We managed to classify the lynx and its prey, the chamois and the roe deer, with a satisfactory degree of certainty.
We then used the model to automatically tag pictures taken with traps set up at another site, in the Ain county. These pictures were also tagged by hand, so the truth is known.
Based on the number of false negatives (a picture with a lynx on it for which another species is predicted) and false positives (a picture without a lynx for which a lynx is predicted), the results are not entirely satisfactory. However, the question is whether this lack of accuracy harms the inference of predator-prey interactions. To answer it, we used statistical models that infer co-occurrence between species while accounting for the difficulty of detecting them in the field.
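The false negative and false positive counts described above are obtained by comparing the model predictions with the manual tags. A minimal sketch with made-up labels (the vectors `truth` and `pred` are hypothetical, standing in for the manual and automatic tags of the Ain pictures):

```r
# Hypothetical manual (truth) and automatic (pred) tags for six pictures
truth <- c("lynx", "roe deer", "lynx", "chamois", "fox", "lynx")
pred  <- c("lynx", "lynx",     "fox",  "chamois", "fox", "lynx")

# False negative: a lynx picture for which another species is predicted
fn <- sum(truth == "lynx" & pred != "lynx")
# False positive: a picture without a lynx for which a lynx is predicted
fp <- sum(truth != "lynx" & pred == "lynx")

c(false_negatives = fn, false_positives = fp)  # 1 and 1 here
```

The same comparison, run over all Ain pictures, gives the counts discussed above.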
In this section, we analyse the data acquired in the previous section. We formatted the camera trap data by generating monthly detection histories, that is, sequences of detections (\(Y_{sit} = 1\)) and non-detections (\(Y_{sit} = 0\)) for species \(s\) at site \(i\) and sampling occasion \(t\) (see Figure 2).
Figure 2: Detections (black) and non-detections (light grey) for each of the 3 species lynx, chamois and roe deer. Sites are on the Y axis, while sampling occasions are on the X axis.
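Building the \(Y_{sit}\) detection histories amounts to asking, for each species, site and month, whether at least one picture of that species was taken. A minimal sketch on toy records (the site names, species and months below are made up):

```r
# Hypothetical records: one row per identified picture
records <- data.frame(
  site    = c("J01", "J01", "J02", "J02", "J01"),
  species = c("lynx", "roe deer", "lynx", "lynx", "lynx"),
  month   = c("2016-02", "2016-02", "2016-03", "2016-04", "2016-04")
)
sites  <- c("J01", "J02")
months <- c("2016-02", "2016-03", "2016-04")

# Y[i, t] = 1 if species sp was photographed at site i in month t, 0 otherwise
detection_history <- function(sp) {
  Y <- outer(sites, months, Vectorize(function(s, m) {
    as.integer(any(records$site == s &
                   records$species == sp &
                   records$month == m))
  }))
  dimnames(Y) <- list(sites, months)
  Y
}

(Y_lynx <- detection_history("lynx"))
```

Applied to the real records, this produces one site-by-occasion matrix per species, as displayed in Figure 2.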
To quantify spatial co-occurrence between the lynx and its prey, we used a multispecies occupancy modeling approach (Rota et al. 2016) implemented in the R package unmarked (Fiske and Chandler 2011). The multispecies occupancy model assumes that observations \(y_{sit}\), conditional on the latent occupancy state \(Z_{si}\) of species \(s\) at site \(i\), are drawn from Bernoulli random variables \(Y_{sit} | Z_{si} \sim \mbox{Bernoulli}(Z_{si}p_{sit})\), where \(p_{sit}\) is the detection probability of species \(s\) at site \(i\) and sampling occasion \(t\). Detection probabilities can be modeled as functions of site and/or sampling covariates, or of the presence/absence of other species, but for the sake of illustration we make them species-specific only.
The latent occupancy states are assumed to be distributed as multivariate Bernoulli random variables (Dai, Ding, and Wahba 2013). Let us consider 2 species, species 1 and 2; then \(Z_i = (Z_{i1}, Z_{i2}) \sim \mbox{multivariate Bernoulli}(\psi_{11}, \psi_{10}, \psi_{01}, \psi_{00})\), where \(\psi_{11}\) is the probability that a site is occupied by both species 1 and 2, \(\psi_{10}\) the probability that a site is occupied by species 1 but not 2, \(\psi_{01}\) the probability that a site is occupied by species 2 but not 1, and \(\psi_{00}\) the probability that a site is occupied by neither species. Note that we considered species-specific occupancy probabilities only, but these could be modeled as functions of site-specific covariates. Marginal occupancy probabilities are simply obtained as \(\Pr(Z_{i1}=1) = \psi_{11} + \psi_{10}\) and \(\Pr(Z_{i2}=1) = \psi_{11} + \psi_{01}\). With this model, we may also infer potential interactions by calculating conditional probabilities, for example the probability of a site being occupied by species 2 conditional on the presence of species 1, \(\Pr(Z_{i2} = 1| Z_{i1} = 1) = \displaystyle{\frac{\psi_{11}}{\psi_{11}+\psi_{10}}}\).
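In practice, this model can be fitted with the occuMulti function in unmarked. The sketch below is illustrative, not the full analysis: `y_lynx` and `y_roe` are hypothetical names for two site-by-occasion detection matrices built from the monthly detection histories, and we fit the intercept-only (no covariates) version of the model described in the text.

```r
library(unmarked)

# y_lynx and y_roe: hypothetical site-by-occasion 0/1 detection matrices,
# one per species, built from the monthly detection histories
umf <- unmarkedFrameOccuMulti(y = list(lynx = y_lynx, roedeer = y_roe))

# Species-specific detection probabilities; constant natural parameters
# f1, f2 and f12 for the occupancy component (no covariates)
fit <- occuMulti(detformulas   = c("~1", "~1"),
                 stateformulas = c("~1", "~1", "~1"),
                 data = umf)

# Marginal occupancy, e.g. Pr(Z_lynx = 1) = psi_11 + psi_10
marg <- predict(fit, type = "state", species = "lynx")

# Conditional occupancy, with lynx as species 1 and roe deer as species 2:
# Pr(Z_lynx = 1 | Z_roedeer = 1) = psi_11 / (psi_11 + psi_01)
cond <- predict(fit, type = "state", species = "lynx", cond = "roedeer")
```

The same call extends to the three species of our case study by supplying three detection matrices and the corresponding natural-parameter formulas.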
Marginal occupancy probabilities are displayed in Figure 3, and the probabilities of lynx presence conditional on the presence or absence of its prey in Figure 4. There is a slight bias in the estimated probability of lynx presence given the presence of its two favorite prey species when relying on the automatic tagging of the pictures. Given that the differences are not large, ecologists may decide to ignore them in view of the time saved compared with manual tagging. The bias is larger, however, for the probability of lynx presence given the presence of roe deer and the absence of chamois, which is underestimated.
Figure 3: Marginal occupancy probabilities for all three species (lynx, roe deer and chamois). Parameter estimates are from a multispecies occupancy model using data pooled from the Jura and Ain county study sites, either with the manually-tagged Ain pictures (in red) or with the automatically-tagged Ain pictures (in blue-grey).